SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Covacci, Antonello 
Bugnoli, Massimo
Telford, John 
Macchia, Giovanni 
Rappuoli, Rino

(ii) TITLE OF INVENTION: Helicobacter Pylori Proteins Useful 
for Vaccines and Diagnostics 

(iii) NUMBER OF SEQUENCES: 7 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Chiron Corporation 

(B) STREET: 4560 Horton Street

(C) CITY: Emeryville

(D) STATE: California

(E) COUNTRY: USA

(F) ZIP: 94608-2916

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: 08/471,491 

(B) FILING DATE: 06-JUNE-1995 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: McClung, Barbara G. 

(B) REGISTRATION NUMBER: 33,113 

(C) REFERENCE/DOCKET NUMBER: 0316*003

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (510) 601-2708 

(B) TELEFAX: (510) 655-3542 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GCAAGCTTAT CGATGTCGAC TCGAGCT 27

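
A listing row can be checked mechanically against its declared length. The following is a minimal sketch, not part of the filed listing: it assumes rows are written as space-separated groups of bases followed by a running position counter, and the helper name `parse_listing_rows` is hypothetical.

```python
import re

def parse_listing_rows(rows):
    """Join the base groups of listing rows; return (sequence, last counter)."""
    groups = []
    last_counter = None
    for row in rows:
        tokens = row.split()
        # A trailing all-digit token is the running position counter.
        if tokens and tokens[-1].isdigit():
            last_counter = int(tokens.pop())
        groups.extend(tokens)
    return "".join(groups), last_counter

# SEQ ID NO:1 as printed above; item (A) declares a length of 27 base pairs.
seq1, counter = parse_listing_rows(["GCAAGCTTAT CGATGTCGAC TCGAGCT 27"])
only_bases = re.fullmatch("[ACGT]+", seq1) is not None
print(len(seq1), counter, only_bases)  # 27 27 True
```

The same check applied to a longer sequence would flag rows whose groups were damaged in reproduction, since the joined length would no longer agree with the counter.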
(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3960 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

AA^I^GAAAG GAAGAAAATG GAAATACAAC AAACACACCG CAAAATCAAT CGCCCTCTGG 60 

TT^^TCTCGC TTTAGTAGGA GCATTAGTCA GCATCACACC GCAACAAAGT CATGCCGCCT 120 

TTTTCACAAC CGTGATCATT CCAGCCATTG TTGGGGGTAT CGCTACAGGC ACCGCTGTAG 180 

GAACCGTCTC AGGGCTTCTT AGCTGGGGGC TCAAACAAGC CGAAGAAGCC AATAAAACCC 240

CAGATAAACC CGATAAAGTT TGGCGCATTC AAGCAGGAAA AGGCTTTAAT GAATTCCCTA 300

ACAAGGAATA CGACTTATAC AGATCCCTTT TATCCAGTAA GATTGATGGA GGTTGGGATT 360

GGGGGAATGC CGCTAGGCAT TATTGGGTCA AAGGCGGGCA ACAGAATAAG CTTGAAGTGG 420

ATATGAAAGA CGCTGTAGGG ACTTATACCT TATCAGGGCT TAGAAACTTT ACTGGTGGGG 480

ATTTAGATGT CAATATGCAA AAAGCCACTT TACGCTTGGG CCAATTCAAT GGCAATTCTT 540

TTACAAGCTA TAAGGATAGT GCTGATCGCA CCACGAGAGT GGATTTCAAC GCTAAAAATA 600

TCTCAATTGA TAATTTTGTA GAAATCAACA ATCGTGTGGG TTCTGGAGCC GGGAGGAAAG 660

CCAGCTCTAC GGTTTTGACT TTGCAAGCTT CAGAAGGGAT CACTAGCGAT AAAAACGCTG 720

AAATTTCTCT TTATGATGGT GCCACGCTCA ATTTGGCTTC AAGCAGCGTT AAATTAATGG 780

GTAATGTGTG GATGGGCCGT TTGCAATACG TGGGAGCGTA TTTGGCCCCT TCATACAGCA 840

CGATAAACAC TTCAAAAGTA ACAGGGGAAG TGAATTTTAA CCACCTCACT GTTGGCGATA 900

AAAACGCCGC TCAAGCGGGC ATTATCGCTA ATAAAAAGAC TAATATTGGC ACACTGGATT 960

TGTGGCAAAG CGCCGGGTTA AACATTATCG CTCCTCCAGA AGGTGGCTAT AAGGATAAAC 1020

CCAATAATAC CCCTTCTCAA AGTGGTGCTA AAAACGACAA AAATGAAAGC GCTAAAAACG 1080

ACAAACAAGA GAGCAGTCAA AATAATAGTA ACACTCAGGT CATTAACCCA CCCAATAGTG 1140

CGCAAAAAAC AGAAGTTCAA CCCACGCAAG TCATTGATGG GCCTTTTGCG GGCGGCAAAG 1200

ACACGGTTGT CAATATCAAC CGCATCAACA CTAACGCTGA TGGCACGATT AGAGTGGGAG 1260

GGTTTAAAGC TTCTCTTACC ACCAATGCGG CTCATTTGCA TATCGGCAAA GGCGGTGTCA 1320

ATCTGTCCAA TCAAGCGAGC GGGCGCTCTC TTATAGTGGA AAATCTAACT GGGAATATCA 1380

CCGTTGATGG GCCTTTAAGA GTGAATAATC AAGTGGGTGG CTATGCTTTG GCAGGATCAA 1440

GCGCGAATTT TGAGTTTAAG GCTGGTACGG ATACCAAAAA CGGCACAGCC ACTTTTAATA 1500

ACGATATTAG TCTGGGAAGA TTTGTGAATT TAAAGGTGGA TGCTCATACA GCTAATTTTA 1560

AAGGTATTGA TACGGGTAAT GGTGGTTTCA ACACCTTAGA TTTTAGTGGC GTTACAGACA 1620

AACfjfcAATAT CAACAAGCTC ATTACGGCTT CCACTAATGT GGCCGTTAAA AACTTCAACA 1680

TTAATGAATT GATTGTTAAA ACCAATGGGA TAAGTGTGGG GGAATATACT CATTTTAGCG 1740

AAGATATAGG CAGTCAATCG CGCATCAATA CCGTGCGTTT GGAAACTGGC ACTAGGTCAC 1800

TTTTCTCTGG GGGTGTTAAA TTTAAAGGTG GCGAAAAATT GGTTATAGAT GAGTTTTACT 1860

ATAGCCCTTG GAATTATTTT GACGCTAGAA ATATTAAAAA TGTTGAAATC ACCAATAAAC 1920

TTGCTTTTGG ACCTCAAGGA AGTCCTTGGG GCACATCAAA ACTTATGTTC AATAATCTAA 1980

CCCTAGGTCA AAATGCGGTC ATGGATTATA GCCAATTTTC AAATTTAACC ATTCAAGGGG 2040

ATTTCATCAA CAATCAAGGC ACTATCAACT ATCTGGTCCG AGGTGGGAAA GTGGCAACCT 2100

TAAGCGTAGG CAATGCAGCA GCTATGATGT TTAATAATGA TATAGACAGC GCGACCGGAT 2160

TTTACAAACC GCTCATCAAG ATTAACAGCG CTCAAGATCT CATTAAAAAT A r* 2^ n 2^ 2^ r* a Tri 2220

TTTTATTGAA AGCGAAAATC ATTGGTTATG GTAATGTTTC TACAGGTACC AATGGCATTA 2280

GTAATGTTAA TCTAGAAGAG CAATTCAAAG AGCGCCTAGC CCTTTATAAC AACAATAACC 2340

GCATGGATAC TTGTGTGGTG CGAAATACTG ATGACATTAA AGCATGCGGT ATGGCTATCG 2400

GCGATCAAAG CATGGTGAAC AACCCTGACA ATTACAAGTA TCTTATCGGT AAGGCATGGA 2460

AAAATATAGG GATCAGCAAA ACAGCTAATG GCTCTAAAAT TTCGGTGTAT TATTTAGGCA 2520

ATTCTACGCC TACTGAGAAT GGTGGCAATA CCACAAATTT ACCCACAAAC ACCACTAGCA 2580

ATGCACGTTC TGCCAACAAC GCCCTTGCAC AAAACGCTCC TTTCGCTCAA CCTAGTGCTA 2640

CTCCTAATTT AGTCGCTATC AATCAGCATG ATTTTGGCAC TATTGAAAGC GTGTTTGAAT 2700

TGGCTAACCG CTCTAAAGAT ATTGACACGC TTTATGCTAA CTCAGGCGCT CAAGGCAGGG 2760

ATCTCTTACA AACCTTATTG ATTGATAGCC ATGATGCGGG TTATGCCAGA AAAATGATTG 2820

ATGCTACAAG CGCTAATGAA ATCACCAAGC AATTGAATAC GGCCACTACC ACTTTAAACA 2880

ACATAGCCAG TTTAGAGCAT AAAACCAGCG GCTTACAAAC TTTGAGCTTG AGTAATGCGA 2940

TGATTTTAAA TTCTCGTTTA GTCAATCTCT CCAGGAGACA CACCAACCAT ATTGACTCGT 3000

TCGCCAAACG CTTACAAGCT TTAAAAGACC AAAAATTCGC TTCTTTAGAA AGCGCGGCAG 3060

AAGTOGTTGTA TCAATTTGCC CCTAAATATG AAAAACCTAC CAATGTTTGG GCTAACGCTA 3120

TTGGGGGAAC GAGCTTGAAT AATGGCTCTA ACGCTTCATT GTATGGCACA AGCGCGGGCG 3180

TAGACGCTTA CCTTAACGGG CAAGTGGAAG CCATTGTGGG CGGTTTTGGA AGCTATGGTT 3240

ATAGCTCTTT TAATAATCGT GCGAACTCCC TTAACTCTGG GGCCAATAAC ACTAATTTTG 3300

GCGltsTATAG CCGTATTTTT GCCAACCAGC ATGAATTTGA CTTTGAAGCT CAAGGGGCAC 3360

TAGftbAGCGA TCAATCAAGC TTGAATTTCA AAAGCGCTCT ATTACAAGAT TTGAATCAAA 3420

GCI^'rCATTA CTTAGCCTAT AGCGCTGCAA CAAGAGCGAG CTATGGTTAT GACTTCGCGT 3480

TTTTTAGGAA CGCTTTAGTG TTAAAACCAA GCGTGGGTGT GAGCTATAAC CATTTAGGTT 3540

CAACCAACTT TAAAAGCAAC AGCACCAATC AAGTGGCTTT GAAAAATGGC TCTAGCAGTC 3600

AGCATTTATT CAACGCTAGC GCTAATGTGG AAGCGCGCTA TTATTATGGG GACACTTCAT 3660

ACTTCTACAT GAATGCTGGA \J ± X ± 1. x\\.,..^^^n\j

CGTCTTTAAA CACCTTTAAA GTGAATGCCG CTCGCAACCC TTTAAATACC CATGCCAGAG 3780

TGATGATGGG TGGGGAATTA AAATTAGCTA AAGAAGTGTT TTTGAATTTG GGCGTTGTTT 3840

ATTTGCACAA TTTGATTTCC AATATAGGCC ATTTCGCTTC CAATTTAGGA ATGAGGTATA 3900

GTTTCTAAAT ACCGCTCTTA AACCCATGCT CAAAGCATGG GTTTGAAATC TTACAAAACA 3960



(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1296 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Met Glu Ile Gln Gln Thr His Arg Lys Ile Asn Arg Pro Leu Val Ser
1 5 10 15

Leu Ala Leu Val Gly Ala Leu Val Ser Ile Thr Pro Gln Gln Ser His
20 25 30

Ala Ala Phe Phe Thr Thr Val Ile Ile Pro Ala Ile Val Gly Gly Ile
35 40 45

Ala Thr Gly Thr Ala Val Gly Thr Val Ser Gly Leu Leu Ser Trp Gly
50 55 60

Leu Lys Gln Ala Glu Glu Ala Asn Lys Thr Pro Asp Lys Pro Asp Lys
65 70 75 80

Val Trp Arg Ile Gln Ala Gly Lys Gly Phe Asn Glu Phe Pro Asn Lys
85 90 95

Glu Tyr Asp Leu Tyr Arg Ser Leu Leu Ser Ser Lys Ile Asp Gly Gly
100 105 110

Trp Asp Trp Gly Asn Ala Ala Arg His Tyr Trp Val Lys Gly Gly Gln
115 120 125

Gln Asn Lys Leu Glu Val Asp Met Lys Asp Ala Val Gly Thr Tyr Thr
130 135 140

Leu Ser Gly Leu Arg Asn Phe Thr Gly Gly Asp Leu Asp Val Asn Met
145 150 155 160

Gln Lys Ala Thr Leu Arg Leu Gly Gln Phe Asn Gly Asn Ser Phe Thr
165 170 175

Ser Tyr Lys Asp Ser Ala Asp Arg Thr Thr Arg Val Asp Phe Asn Ala
180 185 190

Lys Asn Ile Ser Ile Asp Asn Phe Val Glu Ile Asn Asn Arg Val Gly
195 200 205

Ser Gly Ala Gly Arg Lys Ala Ser Ser Thr Val Leu Thr Leu Gln Ala
210 215 220

Ser Glu Gly Ile Thr Ser Asp Lys Asn Ala Glu Ile Ser Leu Tyr Asp
225 230 235 240

Gly Ala Thr Leu Asn Leu Ala Ser Ser Ser Val Lys Leu Met Gly Asn
245 250 255

Val Trp Met Gly Arg Leu Gln Tyr Val Gly Ala Tyr Leu Ala Pro Ser
260 265 270

Tyr Ser Thr Ile Asn Thr Ser Lys Val Thr Gly Glu Val Asn Phe Asn
275 280 285

His Leu Thr Val Gly Asp Lys Asn Ala Ala Gln Ala Gly Ile Ile Ala
290 295 300

Asn Lys Lys Thr Asn Ile Gly Thr Leu Asp Leu Trp Gln Ser Ala Gly
305 310 315 320

Leu Asn Ile Ile Ala Pro Pro Glu Gly Gly Tyr Lys Asp Lys Pro Asn
325 330 335

Asn Thr Pro Ser Gln Ser Gly Ala Lys Asn Asp Lys Asn Glu Ser Ala
340 345 350

Lys Asn Asp Lys Gln Glu Ser Ser Gln Asn Asn Ser Asn Thr Gln Val
355 360 365

Ile Asn Pro Pro Asn Ser Ala Gln Lys Thr Glu Val Gln Pro Thr Gln
370 375 380

Val Ile Asp Gly Pro Phe Ala Gly Gly Lys Asp Thr Val Val Asn Ile
385 390 395 400

Asn Arg Ile Asn Thr Asn Ala Asp Gly Thr Ile Arg Val Gly Gly Phe
405 410 415

Lys Ala Ser Leu Thr Thr Asn Ala Ala His Leu His Ile Gly Lys Gly
420 425 430

Gly Val Asn Leu Ser Asn Gln Ala Ser Gly Arg Ser Leu Ile Val Glu
435 440 445

Asn Leu Thr Gly Asn Ile Thr Val Asp Gly Pro Leu Arg Val Asn Asn
450 455 460

Gln Val Gly Gly Tyr Ala Leu Ala Gly Ser Ser Ala Asn Phe Glu Phe
465 470 475 480

Lys Ala Gly Thr Asp Thr Lys Asn Gly Thr Ala Thr Phe Asn Asn Asp
485 490 495

Ile Ser Leu Gly Arg Phe Val Asn Leu Lys Val Asp Ala His Thr Ala
500 505 510

Asn Phe Lys Gly Ile Asp Thr Gly Asn Gly Gly Phe Asn Thr Leu Asp
515 520 525

Phe Ser Gly Val Thr Asp Lys Val Asn Ile Asn Lys Leu Ile Thr Ala
530 535 540

Ser Thr Asn Val Ala Val Lys Asn Phe Asn Ile Asn Glu Leu Ile Val
545 550 555 560

Lys Thr Asn Gly Ile Ser Val Gly Glu Tyr Thr His Phe Ser Glu Asp
565 570 575

Ile Gly Ser Gln Ser Arg Ile Asn Thr Val Arg Leu Glu Thr Gly Thr
580 585 590

Arg Ser Leu Phe Ser Gly Gly Val Lys Phe Lys Gly Gly Glu Lys Leu
595 600 605

Val Ile Asp Glu Phe Tyr Tyr Ser Pro Trp Asn Tyr Phe Asp Ala Arg
610 615 620

Asn Ile Lys Asn Val Glu Ile Thr Asn Lys Leu Ala Phe Gly Pro Gln
625 630 635 640

Gly Ser Pro Trp Gly Thr Ser Lys Leu Met Phe Asn Asn Leu Thr Leu
645 650 655

Gly Gln Asn Ala Val Met Asp Tyr Ser Gln Phe Ser Asn Leu Thr Ile
660 665 670

Gln Gly Asp Phe Ile Asn Asn Gln Gly Thr Ile Asn Tyr Leu Val Arg
675 680 685

Gly Gly Lys Val Ala Thr Leu Ser Val Gly Asn Ala Ala Ala Met Met
690 695 700

Phe Asn Asn Asp Ile Asp Ser Ala Thr Gly Phe Tyr Lys Pro Leu Ile
705 710 715 720

Lys Ile Asn Ser Ala Gln Asp Leu Ile Lys Asn Thr Glu His Val Leu
725 730 735

Leu Lys Ala Lys Ile Ile Gly Tyr Gly Asn Val Ser Thr Gly Thr Asn
740 745 750

Gly Ile Ser Asn Val Asn Leu Glu Glu Gln Phe Lys Glu Arg Leu Ala
755 760 765

Leu Tyr Asn Asn Asn Asn Arg Met Asp Thr Cys Val Val Arg Asn Thr
770 775 780

Asp Asp Ile Lys Ala Cys Gly Met Ala Ile Gly Asp Gln Ser Met Val
785 790 795 800

Asn Asn Pro Asp Asn Tyr Lys Tyr Leu Ile Gly Lys Ala Trp Lys Asn
805 810 815

Ile Gly Ile Ser Lys Thr Ala Asn Gly Ser Lys Ile Ser Val Tyr Tyr
820 825 830

Leu Gly Asn Ser Thr Pro Thr Glu Asn Gly Gly Asn Thr Thr Asn Leu
835 840 845

Pro Thr Asn Thr Thr Ser Asn Ala Arg Ser Ala Asn Asn Ala Leu Ala
850 855 860

Gln Asn Ala Pro Phe Ala Gln Pro Ser Ala Thr Pro Asn Leu Val Ala
865 870 875 880

Ile Asn Gln His Asp Phe Gly Thr Ile Glu Ser Val Phe Glu Leu Ala
885 890 895

Asn Arg Ser Lys Asp Ile Asp Thr Leu Tyr Ala Asn Ser Gly Ala Gln
900 905 910

Gly Arg Asp Leu Leu Gln Thr Leu Leu Ile Asp Ser His Asp Ala Gly
915 920 925

Tyr Ala Arg Lys Met Ile Asp Ala Thr Ser Ala Asn Glu Ile Thr Lys
930 935 940

Gln Leu Asn Thr Ala Thr Thr Thr Leu Asn Asn Ile Ala Ser Leu Glu
945 950 955 960

His Lys Thr Ser Gly Leu Gln Thr Leu Ser Leu Ser Asn Ala Met Ile
965 970 975

Leu Asn Ser Arg Leu Val Asn Leu Ser Arg Arg His Thr Asn His Ile
980 985 990

Asp Ser Phe Ala Lys Arg Leu Gln Ala Leu Lys Asp Gln Lys Phe Ala
995 1000 1005

Ser Leu Glu Ser Ala Ala Glu Val Leu Tyr Gln Phe Ala Pro Lys Tyr
1010 1015 1020

Glu Lys Pro Thr Asn Val Trp Ala Asn Ala Ile Gly Gly Thr Ser Leu
1025 1030 1035 1040

Asn Asn Gly Ser Asn Ala Ser Leu Tyr Gly Thr Ser Ala Gly Val Asp
1045 1050 1055

Ala Tyr Leu Asn Gly Gln Val Glu Ala Ile Val Gly Gly Phe Gly Ser
1060 1065 1070

Tyr Gly Tyr Ser Ser Phe Asn Asn Arg Ala Asn Ser Leu Asn Ser Gly
1075 1080 1085

Ala Asn Asn Thr Asn Phe Gly Val Tyr Ser Arg Ile Phe Ala Asn Gln
1090 1095 1100

His Glu Phe Asp Phe Glu Ala Gln Gly Ala Leu Gly Ser Asp Gln Ser
1105 1110 1115 1120

Ser Leu Asn Phe Lys Ser Ala Leu Leu Gln Asp Leu Asn Gln Ser Tyr
1125 1130 1135

His Tyr Leu Ala Tyr Ser Ala Ala Thr Arg Ala Ser Tyr Gly Tyr Asp
1140 1145 1150

Phe Ala Phe Phe Arg Asn Ala Leu Val Leu Lys Pro Ser Val Gly Val
1155 1160 1165

Ser Tyr Asn His Leu Gly Ser Thr Asn Phe Lys Ser Asn Ser Thr Asn
1170 1175 1180

Gln Val Ala Leu Lys Asn Gly Ser Ser Ser Gln His Leu Phe Asn Ala
1185 1190 1195 1200

Ser Ala Asn Val Glu Ala Arg Tyr Tyr Tyr Gly Asp Thr Ser Tyr Phe
1205 1210 1215

Tyr Met Asn Ala Gly Val Leu Gln Glu Phe Ala His Val Gly Ser Asn
1220 1225 1230

Asn Ala Ala Ser Leu Asn Thr Phe Lys Val Asn Ala Ala Arg Asn Pro
1235 1240 1245

Leu Asn Thr His Ala Arg Val Met Met Gly Gly Glu Leu Lys Leu Ala
1250 1255 1260

Lys Glu Val Phe Leu Asn Leu Gly Val Val Tyr Leu His Asn Leu Ile
1265 1270 1275 1280

Ser Asn Ile Gly His Phe Ala Ser Asn Leu Gly Met Arg Tyr Ser Phe
1285 1290 1295
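
The relationship between SEQ ID NO:2 and SEQ ID NO:3 can be spot-checked by translating the start of the open reading frame. The sketch below is illustrative only and is not part of the filed listing: it assumes the standard genetic code, the helper name `translate` is hypothetical, and the input is bases 18 to 59 of SEQ ID NO:2 (beginning at the first ATG of the first listing row).

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "End",
}

def translate(dna):
    """Translate the complete codons of a DNA string into three-letter residues."""
    usable = len(dna) - len(dna) % 3
    return [THREE_LETTER[CODON_TABLE[dna[i:i + 3]]] for i in range(0, usable, 3)]

# Bases 18-59 of SEQ ID NO:2, copied from the first listing row.
orf_start = "ATG" "GAAATACAAC" "AAACACACCG" "CAAAATCAAT" "CGCCCTCTG"
# First 14 residues of SEQ ID NO:3.
expected = "Met Glu Ile Gln Gln Thr His Arg Lys Ile Asn Arg Pro Leu".split()
translated = translate(orf_start)
print(translated == expected)  # True
```

Extending the input to further rows gives a residue-by-residue comparison that also exposes bases damaged in reproduction, since a damaged codon will no longer translate to the listed residue.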



(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5925 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:



CTCCATTTTA AGCAACTCCA TAGACCACTA AAGAAACTTT TTTTGAGGCT ATCTTTGAAA 60

ATCTGTCCTA TTGATTTGTT TTCCATTTTG TTTCCCATGT GGATCTTGTG GATCACAAAC 120

GCTTAATTAT ACATGCTATA GTAAGCATGA CACACAAACC AAACTATTTT TAGAACGCTT 180

CATGTGCTCA CCTTGACTAA CCATTTCTCC AACCATACTT TAGCGTTGCA TTTGATTTCT 240

TCAAAAAGAT TCATTTCTTA TTTCTTGTTC TTATTAAAGT TCTTTCATTT TAGCAAATTT 300

IfGTTAATTG TGGGTAAAAA TGTGAATCGT CCTAGCCTTT AGACGCCTGC AACGATCGGG 360

cQtttttcaa TATTAATAAT GATTAATGAA AAAAAAAAAA AATGCTTGAT ATTGTTGTAT 420

;5Ltgagaatg TTCAAAGACA TGAATTGACT ACTCAAGCGT GTAGCGATTT TTAGCAGTCT 480

t^gacactaa CAAGATACCG ATAGGTATGA AACTAGGTAT AGTAAGGAGA AACAATGACT 540

AACGAAACCA TTGACCAACA ACCACAAACC GAAGCGGCTT TTAACCCGCA GCAATTTATC 600

AATAATCTTC AAGTAGCTTT TCTTAAAGTT GATAACGCTG TCGCTTCATA CGATCCTGAT 660

CAAAAACCAA TCGTTGATAA GAACGATAGG GATAACAGGC AAGCTTTTGA AGGAATCTCG 720

CAATTAAGGG AAGAATACTC CAATAAAGCG ATCAAAAATC CTACCAAAAA GAATCAGTAT 780

TTTTCAGACT TTATCAATAA GAGCAATGAT TTAATCAACA AAGACAATCT CATTGATGTA 840

GAATCTTCCA CAAAGAGCTT TCAGAAATTT GGGGATCAGC GTTACCGAAT TTTCACAAGT 900

TGGGTGTCCC ATCAAAACGA TCCGTCTAAA ATCAACACCC GATCGATCCG AAATTTTATG 960

GAAAATATCA TACAACCCCC TATCCTTGAT GATAAAGAGA AAGCGGAGTT TTTGAAATCT 1020

GCCAAACAAT CTTTTGCAGG AATCATTATA GGGAATCAAA TCCGAACGGA TCAAAAGTTC 1080

ATGGGCGTGT TTGATGAGTC CTTGAAAGAA AGGCAAGAAG CAGAAAAAAA TGGAGAGCCT 1140

ACTGGTGGGG ATTGGTTGGA TATTTTTCTC TCATTTATAT TTGACAAAAA ACAATCTTCT 1200

GATGTCAAAG AAGCAATCAA TCAAGAACCA GTTCCCCATG TCCAACCAGA TATAGCCACT 1260

ACCACCACCG ACATACAAGG CTTACCGCCT GAAGCTAGAG ATTTACTTGA TGAAAGGGGT 1320

AATTTTTCTA AATTCACTCT TGGCGATATG GAAATGTTAG ATGTTGAGGG AGTCGCTGAC 1380

ATTGATCCCA ATTACAAGTT CAATCAATTA TTGATTCACA ATAACGCTCT GTCTTCTGTG 1440

TTAATGGGGA GTCATAATGG CATAGAACCT GAAAAAGTTT CATTGTTGTA TGGGGGCAAT 1500

GGTGGTCCTG GAGCTAGGCA TGATTGGAAC GCCACCGTTG GTTATAAAGA CCAACAAGGC 1560

AACAATGTGG CTACAATAAT TAATGTGCAT ATGAAAAACG GCAGTGGCTT AGTCATAGCA 1620

GGTGGTGAGA AAGGGATTAA CAACCCTAGT TTTTATCTCT ACAAAGAAGA CCAACTCACA 1680

GGCTCACAAC GAGCATTAAG TCAAGAAGAG ATCCAAAACA AAATAGATTT CATGGAATTT 1740

CTTGCACAAA ATAATGCTAA ATTAGACAAC TTGAGCGAGA AAGAGAAGGA AAAATTCCGA 1800

ACTGAGATTA AAGATTTCCA AAAAGACTCT AAGGCTTATT TAGACGCCCT AGGGAATGAT 1860

CC^ATTGCTT TTGTTTCTAA AAAAGACACA AAACATTCAG CTTTAATTAC TGAGTTTGGT 1920

AA^GGGGATT TGAGCTACAC TCTCAAAGAT TATGGGAAAA AAGCAGATAA AGCTTTAGAT 1980

AGGGAGAAAA ATGTTACTCT TCAAGGTAGC CTAAAACATG ATGGCGTGAT GTTTGTTGAT 2040

TAgrCTAATT TCAAATACAC CAACGCCTCC AAGAATCCCA ATAAGGGTGT AGGCGTTACG 2100

AATGGCGTTT CCCATTTAGA AGTAGGCTTT AACAAGGTAG CTATCTTTAA TTTGCCTGAT 2160

tt^sKataatc TCGCTATCAC TAGTTTCGTA AGGCGGAATT TAGAGGATAA ACTAACCACT 2220

aaIpgattgt CCCCACAAGA AGCTAATAAG CTTATCAAAG ATTTTTTGAG CAGCAACAAA 2280

GA)|rTGGTTG GAAAAACTTT AAACTTCAAT AAAGCTGTAG CTGACGCTAA AAACACAGGC 2340

AATTATGATG AAGTGAAAAA AGCTCAGAAA GATCTTGAAA AATCTCTAAG GAAACGAGAG 2400

CATTTAGAGA AAGAAGTAGA GAAAAAATTG GAGAGCAAAA GCGGCAACAA AAATAAAATG 2460

GAAGCAAAAG CTCAAGCTAA CAGCCAAAAA GATGAGATTT TTGCGTTGAT CAATAAAGAG 2520

GCTAATAGAG ACGCAAGAGC AATCGCTTAC GCTCAGAATC TTAAAGGCAT CAAAAGGGAA 2580

TTGTCTGATA AACTTGAAAA TGTCAACAAG AATTTGAAAG ACTTTGATAA ATCTTTTGAT 2640

GAATTCAAAA ATGGCAAAAA TAAGGATTTC AGCAAGGCAG AAGAAACACT AAAAGCCCTT 2700

AAAGGTTCGG TGAAAGATTT AGGTATCAAT CCAGAATGGA TTTCAAAAGT TGAAAACCTT 2760

AATGCAGCTT TGAATGAATT CAAAAATGGC AAAAATAAGG ATTTCAGCAA GGTAACGCAA 2820

GCAAAAAGCG ACCTTGAAAA TTCCGTTAAA GATGTGATCA TCAATCAAAA GGTAACGGAT 2880

AAAGTTGATA ATCTCAATCA AGCGGTATCA GTGGCTAAAG CAACGGGTGA TTTCAGTAGG 2940

GTAGAGCAAG CGTTAGCCGA TCTCAAAAAT TTCTCAAAGG AGCAATTGGC CCAACAAGCT 3000

CAAAAAAATG AAAGTCTCAA TGCTAGAAAA AAATCTGAAA TATATCAATC CGTTAAGAAT 3060

GGTGTGAATG GAACCCTAGT CGGTAATGGG TTATCTCAAG CAGAAGCCAC AACTCTTTCT 3120

AAAAACTTTT CGGACATCAA GAAAGAGTTG AATGCAAAAC TTGGAAATTT CAATAACAAT 3180

AACAATAATG GACTCAAAAA CGAACCCATT TATGCTAAAG TTAATAAAAA GAAAGCAGGG 3240

CAAGCAGCTA GCCTTGAAGA ACCCATTTAC GCTCAAGTTG CTAAAAAGGT AAATGCAAAA 3300

ATTGACCGAC TCAATCAAAT AGCAAGTGGT TTGGGTGTTG TAGGGCAAGC AGCGGGCTTC 3360

CCTTTGAAAA GGCATGATAA AGTTGATGAT CTCAGTAAGG TAGGGCTTTC AAGGAATCAA 3420

GSVTTGGCTC AGAAAATTGA CAATCTCAAT CAAGCGGTAT CAGAAGCTAA AGCAGGTTTT 3480

TTTGGCAATC TAGAGCAAAC GATAGACAAG CTCAAAGATT CTACAAAACA CAATCCCATG 3540

AgrCTATGGG TTGAAAGTGC AAAAAAAGTA CCTGCTAGTT TGTCAGCGAA ACTAGACAAT 3600

TACGCTACTA ACAGCCACAT ACGCATTAAT AGCAATATCA AAAATGGAGC AATCAATGAA 3660

AAAGCGACCG GCATGCTAAC GCAAAAAAAC CCTGAGTGGC TCAAGCTCGT GAATGATAAG 3720

AfckGTTGCGC ATAATGTAGG AAGCGTTCCT TTGTCAGAGT ATGATAAAAT TGGCTTCAAC 3780

cl^AAGAATA TGAAAGATTA TTCTGATTCG TTCAAGTTTT CCACCAAGTT GAACAATGCT 3840

G^S^AAAGACA CTAATTCTGG CTTTACGCAA TTTTTAACCA ATGCATTTTC TACAGCATCT 3900

TATTACTGCT TGGCGAGAGA AAATGCGGAG CATGGAATCA AGAACGTTAA TACAAAAGGT 3960

GGTTTCCAAA AATCTTAAAG GATTAAGGAA TACCAAAAAC GCAAAAACCA CCCCTTGCTA 4020

AAAGCGAGGG GTTTTTTAAT ACTCCTTAGC AGAAATCCCA ATCGTCTTTA GTATTTGGGA 4080

TGAATGCTAC CAATTCATGG TATCATATCC CCATACATTC GTATCTAGCG TAGGAAGTGT 4140

GCAAAGTTAC GCCTTTGGAG ATATGATGTG TGAGACCTGT AGGGAATGCG TTGGAGCTCA 4200

AACTCTGTAA AATCCCTATT ATAGGGACAC AGAGTGAGAA CCAAACTCTC CCTACGGGCA 4260

ACATCAGCCT AGGAAGCCCA ATCGTCTTTA GCGGTTGGGC ACTTCACCTT AAAATATCCC 4320

GACAGACACT AACGAAAGGC TTTGTTCTTT AAAGTCTGCA TGGATATTTC CTACCCCAAA 4380

AAGACTTAAC CCTTTGCTTA AAATTAAGTT TGATTGTGCT AGTGGGTTCG TGCTATAGTG 4440

CGAAAATTAA TTAAGGGTTA TAAAGAGAGC ATAAACTAGA AAAAACAAGT AGCTATAACA 4500

AAGATCAAGT TCAAAAAATC ATAGAGCTTT TAGAGCAAAT TGATCGCGCT CTTAACCAAA 4560

GAAAAATCAG AAAAACCATA GGAATTATCA CACCTTATAA TGCCCAAAAA AGACGCTTGC 4620

GATCAGAAGT GGAAAAATAC GGCTfCAAGA ATTTTGATGA GCTCAAAATA GACACTGTGG 4680

ATGCCTTTCA AGGTGAAGAG GCAGATATTA TTATTTATTC CACCGTGAAA ACTTGTGGTA 4740

ATCTTTCTTT CTTGCTAGAT TCTAAACGCT TGAATGTGGC TATTTCTAGG GCAAAAGAAA 4800

ATCTCATTTT TGTGGGTAAA AAGTCTTTCT TTGAGAATTT ATGAAGCGAT GAGAAGAATA 4860

TCTTTAGCGC TATTTTGCAA GTCTGTAGAT AGGTAATCTT TTCCAAAGAT AATCATTAGA 4920

CATTCTTCGC TTCAAAACGC TTTCATAAAT CTCTCTAAAG CGCTTTATAA TCAACACAAT 4980

AC(^TTATAG TGTGAGCTAT AGCCCCTTTT TGGGAATTGA GTTATTTTGA CTTTAAATTT 5040

TTi^TAGCGT TACAATTTGA GCCATTCTTT AGCTTGTTTT TCTAGCCAGA TCACATCGCC 5100

GCgrCCATGA AATTCCACTT TAGGGAATGC GTGTGCATTT TTTTTAAGGG CGTATTTTTG 5160

CT#AAATAT CCTACAATAG CATCGCCCGA ATGGATGAGT AGGGGGGGTG TTGAAAGGGC 5220

AApLTGCTCC ATAAAATAGC CCTCAATTTT TTGAGCGATT AAGGGAAAAT GCGTGCAACC 5280

taMataatc ACTTCGGGAA AATCTTTAAG GGAGTGAAAT AATAACGCAT GCAAGTTTCT 5340

aaSattcgc CCTCTAAAAT ACTTTCTTCA ATCAAAGGCA CAAAAAGAGA AGTGGCTAAA 5400

tg<^aaacat TCAAATAGCC TTGTTGTTTC AGGGCATTGT CATAAGCGTT GGATTGGATC 5460

GTCGCTTTTG TCCCTAGCAC TAAAATAGGG GCGTTTTTAT CTTTTACTTG TCGCTTGATC 5520

GCTAAAATGC TTGGCTCAAT CACGCCCACA ATAGGGATTT TGGAATGCTT TTGCATCTCT 5580

TCTAAAGCTA GAGCGCTCGC TGTGTTGCAT GCCACAATCA ATAATTCAAT CTGGTGCGGT 5640

TTGAAAAAAT CCAAAGCCTC TAAGCCAAAT TGCTTGATCG TAGTGGGGTC TTTAGTGCCA 5700

TAAGGCACTC TAGCCGTATC GCCATAATAG AivjAi i ICAl CAAATAATTG CGCTTTTAAA 5760

AGGCTTTTTA AAACGCTAAA CCCTCCCACA CCGCTATCAA AAACGCCTAT TTTCATGACA 5820

CTTTTTTAAT TTAATGGGAT TAATTAGGGA TTTTATTTTT CATTCATTAA GTTTAAAAAT 5880

TCTTCATTGT CCTTAGTTTG TTGCATTTTA AGCTT 5925


(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1147 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Met Thr Asn Glu Thr Ile Asp Gln Gln Pro Gln Thr Glu Ala Ala Phe
1 5 10 15

Asn Pro Gln Gln Phe Ile Asn Asn Leu Gln Val Ala Phe Leu Lys Val
20 25 30

Asp Asn Ala Val Ala Ser Tyr Asp Pro Asp Gln Lys Pro Ile Val Asp
35 40 45

Lys Asn Asp Arg Asp Asn Arg Gln Ala Phe Glu Gly Ile Ser Gln Leu
50 55 60

Arg Glu Glu Tyr Ser Asn Lys Ala Ile Lys Asn Pro Thr Lys Lys Asn
65 70 75 80

Gln Tyr Phe Ser Asp Phe Ile Asn Lys Ser Asn Asp Leu Ile Asn Lys
85 90 95

Asp Asn Leu Ile Asp Val Glu Ser Ser Thr Lys Ser Phe Gln Lys Phe
100 105 110

Gly Asp Gln Arg Tyr Arg Ile Phe Thr Ser Trp Val Ser His Gln Asn
115 120 125

Asp Pro Ser Lys Ile Asn Thr Arg Ser Ile Arg Asn Phe Met Glu Asn
130 135 140

Ile Ile Gln Pro Pro Ile Leu Asp Asp Lys Glu Lys Ala Glu Phe Leu
145 150 155 160

Lys Ser Ala Lys Gln Ser Phe Ala Gly Ile Ile Ile Gly Asn Gln Ile
165 170 175

Arg Thr Asp Gln Lys Phe Met Gly Val Phe Asp Glu Ser Leu Lys Glu
180 185 190

Arg Gln Glu Ala Glu Lys Asn Gly Glu Pro Thr Gly Gly Asp Trp Leu
195 200 205

Asp Ile Phe Leu Ser Phe Ile Phe Asp Lys Lys Gln Ser Ser Asp Val
210 215 220

Lys Glu Ala Ile Asn Gln Glu Pro Val Pro His Val Gln Pro Asp Ile
225 230 235 240

Ala Thr Thr Thr Thr Asp Ile Gln Gly Leu Pro Pro Glu Ala Arg Asp
245 250 255

Leu Leu Asp Glu Arg Gly Asn Phe Ser Lys Phe Thr Leu Gly Asp Met
260 265 270

Glu Met Leu Asp Val Glu Gly Val Ala Asp Ile Asp Pro Asn Tyr Lys
275 280 285

Phe Asn Gln Leu Leu Ile His Asn Asn Ala Leu Ser Ser Val Leu Met
290 295 300

Gly Ser His Asn Gly Ile Glu Pro Glu Lys Val Ser Leu Leu Tyr Gly
305 310 315 320

Gly Asn Gly Gly Pro Gly Ala Arg His Asp Trp Asn Ala Thr Val Gly
325 330 335

Tyr Lys Asp Gln Gln Gly Asn Asn Val Ala Thr Ile Ile Asn Val His
340 345 350

Met Lys Asn Gly Ser Gly Leu Val Ile Ala Gly Gly Glu Lys Gly Ile
355 360 365

Asn Asn Pro Ser Phe Tyr Leu Tyr Lys Glu Asp Gln Leu Thr Gly Ser
370 375 380

Gln Arg Ala Leu Ser Gln Glu Glu Ile Gln Asn Lys Ile Asp Phe Met
385 390 395 400

Glu Phe Leu Ala Gln Asn Asn Ala Lys Leu Asp Asn Leu Ser Glu Lys
405 410 415

Glu Lys Glu Lys Phe Arg Thr Glu Ile Lys Asp Phe Gln Lys Asp Ser
420 425 430

Lys Ala Tyr Leu Asp Ala Leu Gly Asn Asp Arg Ile Ala Phe Val Ser
435 440 445

Lys Lys Asp Thr Lys His Ser Ala Leu Ile Thr Glu Phe Gly Asn Gly
450 455 460

Asp Leu Ser Tyr Thr Leu Lys Asp Tyr Gly Lys Lys Ala Asp Lys Ala
465 470 475 480

Leu Asp Arg Glu Lys Asn Val Thr Leu Gln Gly Ser Leu Lys His Asp
485 490 495



Gly Val Met Phe Val Asp Tyr Ser Asn Phe Lys Tyr Thr Asn Ala Ser
500 505 510

Lys Asn Pro Asn Lys Gly Val Gly Val Thr Asn Gly Val Ser His Leu
515 520 525

Glu Val Gly Phe Asn Lys Val Ala Ile Phe Asn Leu Pro Asp Leu Asn
530 535 540

Asn Leu Ala Ile Thr Ser Phe Val Arg Arg Asn Leu Glu Asp Lys Leu
545 550 555 560

Thr Thr Lys Gly Leu Ser Pro Gln Glu Ala Asn Lys Leu Ile Lys Asp
565 570 575

Phe Leu Ser Ser Asn Lys Glu Leu Val Gly Lys Thr Leu Asn Phe Asn
580 585 590

Lys Ala Val Ala Asp Ala Lys Asn Thr Gly Asn Tyr Asp Glu Val Lys
595 600 605

Lys Ala Gln Lys Asp Leu Glu Lys Ser Leu Arg Lys Arg Glu His Leu
610 615 620

Glu Lys Glu Val Glu Lys Lys Leu Glu Ser Lys Ser Gly Asn Lys Asn
625 630 635 640

Lys Met Glu Ala Lys Ala Gln Ala Asn Ser Gln Lys Asp Glu Ile Phe
645 650 655

Ala Leu Ile Asn Lys Glu Ala Asn Arg Asp Ala Arg Ala Ile Ala Tyr
660 665 670

Ala Gln Asn Leu Lys Gly Ile Lys Arg Glu Leu Ser Asp Lys Leu Glu
675 680 685

Asn Val Asn Lys Asn Leu Lys Asp Phe Asp Lys Ser Phe Asp Glu Phe
690 695 700

Lys Asn Gly Lys Asn Lys Asp Phe Ser Lys Ala Glu Glu Thr Leu Lys
705 710 715 720

Ala Leu Lys Gly Ser Val Lys Asp Leu Gly Ile Asn Pro Glu Trp Ile
725 730 735

Ser Lys Val Glu Asn Leu Asn Ala Ala Leu Asn Glu Phe Lys Asn Gly
740 745 750

Lys Asn Lys Asp Phe Ser Lys Val Thr Gln Ala Lys Ser Asp Leu Glu
755 760 765

Asn Ser Val Lys Asp Val Ile Ile Asn Gln Lys Val Thr Asp Lys Val
770 775 780

Asp Asn Leu Asn Gln Ala Val Ser Val Ala Lys Ala Thr Gly Asp Phe
785 790 795 800

Ser Arg Val Glu Gln Ala Leu Ala Asp Leu Lys Asn Phe Ser Lys Glu
805 810 815

Gln Leu Ala Gln Gln Ala Gln Lys Asn Glu Ser Leu Asn Ala Arg Lys
820 825 830

Lys Ser Glu Ile Tyr Gln Ser Val Lys Asn Gly Val Asn Gly Thr Leu
835 840 845

Val Gly Asn Gly Leu Ser Gln Ala Glu Ala Thr Thr Leu Ser Lys Asn
850 855 860

Phe Ser Asp Ile Lys Lys Glu Leu Asn Ala Lys Leu Gly Asn Phe Asn
865 870 875 880

Asn Asn Asn Asn Asn Gly Leu Lys Asn Glu Pro Ile Tyr Ala Lys Val
885 890 895

Asn Lys Lys Lys Ala Gly Gln Ala Ala Ser Leu Glu Glu Pro Ile Tyr
900 905 910

Ala Gln Val Ala Lys Lys Val Asn Ala Lys Ile Asp Arg Leu Asn Gln
915 920 925

Ile Ala Ser Gly Leu Gly Val Val Gly Gln Ala Ala Gly Phe Pro Leu
930 935 940

Lys Arg His Asp Lys Val Asp Asp Leu Ser Lys Val Gly Leu Ser Arg
945 950 955 960

Asn Gln Glu Leu Ala Gln Lys Ile Asp Asn Leu Asn Gln Ala Val Ser
965 970 975

Glu Ala Lys Ala Gly Phe Phe Gly Asn Leu Glu Gln Thr Ile Asp Lys
980 985 990

Leu Lys Asp Ser Thr Lys His Asn Pro Met Asn Leu Trp Val Glu Ser
995 1000 1005

Ala Lys Lys Val Pro Ala Ser Leu Ser Ala Lys Leu Asp Asn Tyr Ala
1010 1015 1020

Thr Asn Ser His Ile Arg Ile Asn Ser Asn Ile Lys Asn Gly Ala Ile
1025 1030 1035 1040

Asn Glu Lys Ala Thr Gly Met Leu Thr Gln Lys Asn Pro Glu Trp Leu
1045 1050 1055

Lys Leu Val Asn Asp Lys Ile Val Ala His Asn Val Gly Ser Val Pro
1060 1065 1070

Leu Ser Glu Tyr Asp Lys Ile Gly Phe Asn Gln Lys Asn Met Lys Asp
1075 1080 1085

Tyr Ser Asp Ser Phe Lys Phe Ser Thr Lys Leu Asn Asn Ala Val Lys
1090 1095 1100

Asp Thr Asn Ser Gly Phe Thr Gln Phe Leu Thr Asn Ala Phe Ser Thr
1105 1110 1115 1120

Ala Ser Tyr Tyr Cys Leu Ala Arg Glu Asn Ala Glu His Gly Ile Lys
1125 1130 1135

Asn Val Asn Thr Lys Gly Gly Phe Gln Lys Ser
1140 1145
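
The declared length of a protein sequence can be cross-checked against the position labels printed under its rows. The sketch below is illustrative only and is not part of the filed listing: the helper name `last_residue_number` is hypothetical, and the starting position 1121 is inferred from the label 1125 sitting under the fifth residue of the penultimate row.

```python
def last_residue_number(first_number, rows):
    """Return the position of the final residue, given the first row's start."""
    count = sum(len(row.split()) for row in rows)
    return first_number + count - 1

# Final two rows of SEQ ID NO:5, residue names in standard three-letter form.
tail_rows = [
    "Ala Ser Tyr Tyr Cys Leu Ala Arg Glu Asn Ala Glu His Gly Ile Lys",
    "Asn Val Asn Thr Lys Gly Gly Phe Gln Lys Ser",
]
final_position = last_residue_number(1121, tail_rows)
print(final_position)  # 1147
```

The result agrees with item (A) LENGTH of SEQ ID NO:5, which declares 1147 amino acids.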

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 546 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Met Ala Lys Glu Ile Lys Phe Ser Asp Ser Ala Arg Asn Leu Leu Phe
1 5 10 15

Glu Gly Val Arg Gln Leu His Asp Ala Val Lys Val Thr Met Gly Pro
20 25 30

Arg Gly Arg Asn Val Leu Ile Gln Lys Ser Tyr Gly Ala Pro Ser Ile
35 40 45

Thr Lys Asp Gly Val Ser Val Ala Lys Glu Ile Glu Leu Ser Cys Pro
50 55 60

Val Ala Asn Met Gly Ala Gln Leu Val Lys Glu Val Ala Ser Lys Thr
65 70 75 80

Ala Asp Ala Ala Gly Asp Gly Thr Thr Thr Ala Thr Val Leu Ala Tyr
85 90 95

Ser Ile Phe Lys Glu Gly Leu Arg Asn Ile Thr Ala Gly Ala Asn Pro
100 105 110
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lie Glu Val Lys Arg Gly Met Asp Lys Ala Ala Glu Ala lie lie Asn 
115 120 125 

Glu Leu Lys Lys Ala Ser Lys Lys Val Gly Gly Lys Glu Glu lie Thr 
130 135 140 

Gin Val Ala Thr lie Ser Ala Asn Ser Asp His Asn lie Gly Lys Leu 
145 150 . 155 160 

lie Ala Asp Ala Met Glu Lys Val Gly Lys Asp Gly Val lie Thr Val 

165 170 175 

Glu Glu Ala Lys Gly Ile Glu Asp Glu Leu Asp Val Val Glu Gly Met 
180 185 190 

Gln Phe Asp Arg Gly Tyr Leu Ser Pro Tyr Phe Val Thr Asn Ala Glu 
195 200 205 

Lys Met Thr Ala Gln Leu Asp Asn Ala Tyr Ile Leu Leu Thr Asp Lys 
210 215 220 

Lys Ile Ser Ser Met Lys Asp Ile Leu Pro Leu Leu Glu Lys Thr Met 
225 230 235 240 

Lys Glu Gly Lys Pro Leu Leu Ile Ile Ala Glu Asp Ile Glu Gly Glu 
245 250 255 

Ala Leu Thr Thr Leu Val Val Asn Lys Leu Arg Gly Val Leu Asn Ile 
260 265 270 

Ala Ala Val Lys Ala Pro Gly Phe Gly Asp Arg Arg Lys Glu Met Leu 
275 280 285 

Lys Asp Ile Ala Ile Leu Thr Gly Gly Gln Val Ile Ser Glu Glu Leu 
290 295 300 

Gly Leu Ser Leu Glu Asn Ala Glu Val Glu Phe Leu Gly Lys Ala Gly 
305 310 315 320 

Arg Ile Val Ile Asp Lys Asp Asn Thr Thr Ile Val Asp Gly Lys Gly 
325 330 335 

His Ser Asp Asp Val Lys Asp Arg Val Ala Gln Ile Lys Thr Gln Ile 
340 345 350 

Ala Ser Thr Thr Ser Asp Tyr Asp Lys Glu Lys Leu Gln Glu Arg Leu 
355 360 365 

Ala Lys Leu Ser Gly Gly Val Ala Val Ile Lys Val Gly Ala Ala Ser 
370 375 380 

Glu Val Glu Met Lys Glu Lys Lys Asp Arg Val Asp Asp Ala Leu Ser 
385 390 395 400 

Ala Thr Lys Ala Ala Val Glu Glu Gly Ile Val Ile Gly Gly Gly Ala 
405 410 415 

Ala Leu Ile Arg Ala Ala Gln Lys Val His Leu Asn Leu His Asp Asp 
420 425 430 

Glu Lys Val Gly Tyr Glu Ile Ile Met Arg Ala Ile Lys Ala Pro Leu 
435 440 445 

Ala Gln Ile Ala Ile Asn Ala Gly Tyr Asp Gly Gly Val Val Val Asn 
450 455 460 

Glu Val Glu Lys His Glu Gly His Phe Gly Phe Asn Ala Ser Asn Gly 
465 470 475 480 

Lys Tyr Val Asp Met Phe Lys Glu Gly Ile Ile Asp Pro Leu Lys Val 
485 490 495 

Glu Arg Ile Ala Leu Gln Asn Ala Val Ser Val Ser Ser Leu Leu Leu 
500 505 510 

Thr Thr Glu Ala Thr Val His Glu Ile Lys Glu Glu Lys Ala Thr Pro 
515 520 525 

Ala Met Pro Asp Met Gly Gly Met Gly Gly Met Gly Gly Met Gly Gly 
530 535 540 

Met Met 
545 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1838 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

AAGCTTGCTG TCATGATCAC AAAAAACACT AAAAAACATT ATTATTAAGG ATACAAAATG 60 

GCAAAAGAAA TCAAATTTTC AGATAGTGCG AGAAACCTTT TATTTGAAGG CGTGAGGCAA 120 

CTCCATGACG CTGTCAAAGT AACCATGGGG CCAAGAGGCA GGAATGTATT GATCCAAAAA 180 

AGCTATGGCG CTCCAAGCAT CACCAAAGAC GGCGTGAGCG TGGCTAAAGA GATTGAATTA 240 

AGTTGCCCAG TAGCTAACAT GGGCGCTCAA CTCGTTAAAG AAGTAGCGAG CAAAACCGCT 300 

GATGCTGCCG GCGATGGCAC GACCACAGCG ACCGTGCTAG CTTATAGCAT TTTTAAAGAA 360 

GGTTTGAGGA ATATCACGGC TGGGGCTAAC CCTATTGAAG TGAAACGAGG CATGGATAAA 420 

GCTGCTGAAG CGATCATTAA TGAGCTTAAA AAAGCGAGCA AAAAAGTAGG CGGTAAAGAA 480 

GAAATCACCC AAGTGGCGAC CATTTCTGCA AACTCCGATC ACAATATCGG GAAACTCATC 540 

GCTGACGCTA TGGAAAAAGT GGGTAAAGAC GGCGTGATCA CCGTTGAGGA AGCTAAGGGC 600 

ATTGAAGATG AATTGGATGT CGTAGAAGGC ATGCAATTTG ATAGAGGCTA CCTCTCCCCT 660 

TATTTTGTAA CGAACGCTGA GAAAATGACC GCTCAATTGG ATAATGCTTA CATCCTTTTA 720 

ACGGATAAAA AAATCTCTAG CATGAAAGAC ATTCTCCCGC TACTAGAAAA AACCATGAAA 780 

GAGGGCAAAC CGCTTTTAAT CATCGCTGAA GACATTGAGG GCGAAGCTTT AACGACTCTA 840 

G^GTGAATA AATTAAGAGG CGTGTTGAAT ATCGCAGCGG TTAAAGCTCC AGGCTTTGGG 900 

GAPAGAAGAA AAGAAATGCT CAAAGACATC GCTATTTTAA CCGGCGGTCA AGTCATTAGC 960 

GAAGAATTGG GCTTGAGTCT AGAAAACGCT GAAGTGGAGT TTTTAGGCAA AGCTGGAAGG 1020 

ATTGTGATTG ACAAAGACAA CACCACGATC GTAGATGGCA AAGGCCATAG CGATGATGTT 1080 

A^GACAGAG TCGCGCAGAT CAAAACCCAA ATTGCAAGTA CGACAAGCGA TTATGACAAA 1140 

GAGAAATTGC AAGAAAGATT GGCTAAACTC TCTGGCGGTG TGGCTGTGAT TAAAGTGGGC 1200 

GCTGCGAGTG AAGTGGAAAT GAAAGAGAAA AAAGACCGGG TGGATGACGC GTTGAGCGCG 1260 

ACTAAAGCGG CGGTTGAAGA AGGCATTGTG ATTGGTGGCG GTGCGGCTCT CATTCGCGCG 1320 

GCTCAAAAAG TGCATTTGAA TTTGCACGAT GATGAAAAAG TGGGCTATGA AATCATCATG 1380 

CGCGCCATTA AAGCCCCATT AGCTCAAATC GCTATCAACG CTGGTTATGA TGGCGGTGTG 1440 

GTCGTGAATG AAGTAGAAAA ACACGAAGGG CATTTTGGTT TTAACGCTAG CAATGGCAAG 1500 

TATGTGGATA TGTTTAAAGA AGGCATTATT GACCCCTTAA AAGTAGAAAG GATCGCTCTA 1560 

CAAAATGCGG TTTCGGTTTC AAGCCTGCTT TTAACCACAG AAGCCACCGT GCATGAAATC 1620 

AAAGAAGAAA AAGCGACTCC GGCAATGCCT GATATGGGTG GCATGGGCGG TATGGGAGGC 1680 

ATGGGCGGCA TGATGTAAGC CCGCTTGCTT TTTAGTATAA TCTGCTTTTA AAATCCCTTC 1740 

TCTAAATCCC CCCCTTTCTA AAATCTCTTT TTTGGGGGGG TGCTTTGATA AAACCGCTCG 1800 

CTTGTAAAAA CATGCAACAA AAAATCTCTG TTAAGCTT 1838 